MapReduce/Bigtable for Distributed Optimization
Authors
Abstract
With large data sets, it can be time-consuming to run gradient-based optimization, for example to minimize the negative log-likelihood of maximum entropy models. Distributed methods are therefore appealing, and a number of distributed gradient optimization strategies have been proposed, including distributed gradient, asynchronous updates, and iterative parameter mixtures. In this paper, we evaluate these strategies with regard to their accuracy and speed over MapReduce/Bigtable and discuss the techniques and configurations needed for high performance.
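As a rough illustration of one of the strategies named in the abstract, the sketch below shows iterative parameter mixing for a two-class maximum entropy (logistic regression) model. It is not the paper's implementation and uses no MapReduce or Bigtable API: shards are processed sequentially in one process, and all names (`local_epoch`, `mix`, `iterative_parameter_mixing`, `make_data`), the synthetic data, the learning rate, and the epoch count are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_epoch(weights, shard, lr):
    """One pass of stochastic gradient ascent on the log-likelihood of one shard."""
    w = list(weights)
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            w[i] += lr * (y - p) * xi
    return w


def mix(weight_vectors):
    """Uniform average of the per-shard weights (stands in for the reduce step)."""
    n = len(weight_vectors)
    return [sum(ws) / n for ws in zip(*weight_vectors)]


def iterative_parameter_mixing(shards, dim, epochs=20, lr=0.1):
    w = [0.0] * dim
    for _ in range(epochs):
        # Stands in for the map step: every shard starts from the mixed weights.
        local = [local_epoch(w, shard, lr) for shard in shards]
        w = mix(local)
    return w


def make_data(n, seed=0):
    # Hypothetical toy data: label is 1 when x1 + x2 > 0.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        y = 1 if x1 + x2 > 0 else 0
        data.append(([1.0, x1, x2], y))  # leading 1.0 is a bias feature
    return data


def accuracy(w, data):
    correct = sum(
        1
        for x, y in data
        if (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
    )
    return correct / len(data)


data = make_data(400)
shards = [data[i::4] for i in range(4)]  # four simulated workers
w = iterative_parameter_mixing(shards, dim=3)
```

In a real deployment, each call to `local_epoch` would run on a separate worker and only the weight vectors would be communicated once per epoch. That is the main contrast with the distributed gradient strategy, which aggregates gradients from all workers at every update step.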
Similar resources
Motivating a Distributed System of Commodity Machines
This report examines the price/performance benefit of using a large cluster of commodity machines rather than server-level hardware for certain large-scale software applications. A number of tools are presented that make it easier to produce software that runs across large clusters of commodity machines. These tools are the Chubby locking service, the Google file system, MapReduce and BigTable...
Hbase - non SQL Database, Performances Evaluation
HBase is the open source version of the BigTable distributed storage system developed by Google for the management of large volumes of structured data. HBase emulates most of the functionality provided by BigTable. Like most non-SQL database systems, HBase is written in Java. The purpose of the current work is to evaluate the performance of the HBase implementation in comparison with SQL database, and...
Fast Multi-fields Query Processing in Bigtable Based Cloud Systems
With the rapid increase of data sizes, enterprise applications are migrating their backend data management and analytic systems into cloud-based data management systems. Bigtable is one of the major data models used by cloud storage systems as their storage layer. Such systems provide high scalability and schema flexibility, and support efficient point and range based queries based on rowk...
DISTRIBUTED SYSTEMS B534 SURVEY PAPER The Chubby Lock Service
This is a survey paper written for the class B534, Distributed Systems. The purpose of this paper is to encourage us to learn new things about how distributed systems come to work in reality and how they are actually evaluated and applied. The topic which is to be presented in this paper is the Chubby Lock service that is implemented by Google and is part of Google Labs. An important point to t...
How Big Hadoop Clusters Break in the Real World
Hadoop is among today’s most widely deployed “big data” systems. Cloudera is a company offering paid Hadoop services and support. This poster abstract describes lessons from examining a sample of 293 support tickets, from February through July of 2011. We manually labelled the tickets in our sample with the established root cause and the specific system component being worked on. Tickets cover ...
Publication date: 2010